Soap Berry - LTEM - EcoDomain Comparison
1 Summary of Soap Berry LTEM protocol
1.1 Basic protocol
Areas rich in soapberries is located for permanent monitoring. In each area, 10 robust bushes are chosen, and two stems on each plant are chosen for sampling. The stems and bushes are marked with permanent tags so that they can be revisited each year.
There are three measures taken in this protocol.
- Berry count. The number of berries produced on the stem is recorded as an index of soapberry production.
- Stem diameter. The diameter (in millimeters) of the stem near its base is measured.
- Mean berry weight. A collection of 25-50 ripe red berries is obtained in August and weighed so the average wet weight of a single berry from each area is obtained.
If the tagged stem has died (or is damaged or the tag on the stem has “disappeared”), a new stem is chosen. This may be from a new bush or the same bush. If the stem has been browsed, then no count is conducted on this stem this year.
If the tagged bush has died (or the tags on all of the stems have “disappeared”), a new bush is selected for subsequent monitoring.
1.2 Database structure
The relevant fields extracted from the database are:
The relevant fields from the database are:
- STUDY_AREA_NAME. The name of the study area.
- SAMPLE_STATION_LABEL. The bush/stem label.
- Date. The date the data was collected. The Year is extracted from this date.
- Berry_Count. The number of soap berries on this stem. If the stem is browed (or damaged) a missing value should be entered here and not the value of 0.
- Stem_Diameter. The diameter (mm) of the stem.
- Average_Weight. The average weight of a sample of berries is collected. Notice that there is only ONE mean weight found per year and so this value
is replicated on every stem line of the sheet. The sample size used to determine the weight is in a separate column. It is assumed that 0’s have been already imputed on the database.
The longitude and latitude are used to find the EcoDomain of each study area.
2 Reading and checking the data
2.1 Extract all LTEM data
The database was queried for all observations from this protocol in the Province.
The following surveys and years of surveys were found.
Number of records in each study area x year combination
Year
STUDY_AREA_NAME_shrt 2013 2014 2015 2016 2017 2018 2019 2020 2021
Ancient Forest/Chun 0 0 0 0 20 20 0 0 0
Beatton Park 0 20 20 20 0 0 0 0 0
Bowser Ecological Re 20 20 20 20 20 20 20 20 20
Crooked River Park 20 20 20 40 0 0 0 23 0
E.C. Manning Park 20 20 20 0 0 20 20 20 0
Elk Falls Provincial 0 0 0 0 0 20 20 20 20
Eskers Park 0 0 20 20 20 0 0 0 0
Fiordland Conservanc 0 0 0 0 0 0 0 0 20
Golden Ears Park 0 0 20 20 19 20 20 0 0
Goldstream Park 0 0 0 0 20 20 0 0 24
John Dean Park 0 0 20 20 20 20 0 0 20
Mount Robson Park 0 0 20 20 0 20 20 30 22
Mount Seymour Park 0 0 20 20 0 22 0 0 0
Muncho Lake Park 0 0 20 19 0 19 0 19 0
Okanagan Mountain Pa 0 20 0 20 0 20 0 20 0
Purcell Wilderness C 0 0 0 20 20 20 20 20 20
Schoen Lake Park 0 0 20 20 20 20 18 20 0
Skagit Valley Park 0 0 0 20 0 20 20 0 20
Strathcona Park 0 20 0 20 20 20 20 20 20
Tarahne Park 0 0 0 0 0 0 0 20 0
Tatshenshini Alsek P 0 0 20 0 0 0 20 0 0
Yaaguun Gandlaay Her 0 20 20 20 20 25 20 20 24
It is not necessary that every STUDY_AREA be measured in every year and the number of measurements at each STUDY_AREA can vary within years and across years.
If STUDY_AREA is only measured in one year, it does not provide any information on the trends and can be dropped. Similary, STUDY_AREAS with only 2 years of data, provide no information on the uncertainty; they can be retained or dropped.
STUDY_AREAs dropped for insuffient data are:
Study Area | # years of data |
|---|---|
Fiordland Conservancy | 1 |
Tarahne Park | 1 |
2.2 Ecodomain of each study area
The Ecodomain of each study area was obtained by looking up the mean latitude/longitude of observations for each STUDY_AREA in the Ecoprovices provided in the bcmaps package. In some cases, latitude/longitude is not provided as noted below:
Site missing long/lat data
STUDY_AREA_NAME_shrt LONGITUDE_DD LATITUDE_DD
1 Ancient Forest/Chun NaN NaN
3 Bowser Ecological Re NaN NaN
5 E.C. Manning Park NaN NaN
8 Golden Ears Park NaN NaN
13 Muncho Lake Park NaN NaN
14 Okanagan Mountain Pa NaN NaN
15 Purcell Wilderness C NaN NaN
17 Skagit Valley Park NaN NaN
19 Tatshenshini Alsek P NaN NaN
How is it possible that long/lat is not know for some study area?
The classification of each remaining STUDY_AREA into its EcoDomain is:
ECODOMAIN
STUDY_AREA_NAME_shrt CoastMtn HumConHigh North
Beatton Park 0 0 1
Crooked River Park 0 1 0
Elk Falls Provincial 1 0 0
Eskers Park 0 1 0
Goldstream Park 1 0 0
John Dean Park 1 0 0
Mount Robson Park 0 1 0
Mount Seymour Park 1 0 0
Schoen Lake Park 1 0 0
Strathcona Park 1 0 0
Yaaguun Gandlaay Her 1 0 0
# check if at least 2 eco domains at at least 4 study areas and at least 3 years x 2 ecodomains x 4 study area
Records in remaining sites where ECODOMAIN could be determined
ECODOMAIN
STUDY_AREA_NAME_shrt CoastMtn HumConHigh North
Beatton Park 0 0 60
Crooked River Park 0 123 0
Elk Falls Provincial 80 0 0
Eskers Park 0 60 0
Goldstream Park 64 0 0
John Dean Park 100 0 0
Mount Robson Park 0 132 0
Mount Seymour Park 62 0 0
Schoen Lake Park 118 0 0
Strathcona Park 140 0 0
Yaaguun Gandlaay Her 169 0 0
2.3 Checking species code
The species code should be the same across for all data values.
Year
SPECIES_CODE 2013 2014 2015 2016 2017 2018 2019 2020 2021
NULL 0 12 2 5 2 0 0 0 0
SHEPCAN 20 40 98 98 20 20 20 53 22
VACCOVL 0 0 0 20 20 20 18 20 0
VACCPAR 0 28 60 77 78 127 60 60 108
*** WARNING *** More than one species name found - OK if some are NULL
Is only SHEPCAN to be analyzed in this document??
3 Multi-Site Analysis
This design has multiple stems on bushes that are repeatedly measured over time. Please refer to the Fitting Trends with Complex Study Designs document in the CommonFile directory for information on fitting trends with complex study designs.
All analyses were done using the R (R Core Team, 2022) analysis system. All plots are also saved as separate *png files for inclusion into other reports.
3.1 Mean berry weight.
This measurement is taken at the STUDY_AREA_NAME level and so there is one measurement available per STUDY_AREA_NAME.year. Notice that this value is replicated multiple times in the database for each individual stem. These are NOT real replicated readings but only an artifact of the database so some care is needed to extract only a single value per STUDY_AREA_NAME.year.
3.1.1 Summarize to STUDY_AREA_NAME.year level
The data is first summarized to the STUDY_AREA_NAME.year level by extracting a single value for the mean berry weight from the repeated values in the database.
The summary data is:
STUDY_AREA_NAME STUDY_AREA_NAME_shrt ECODOMAIN Year Mean.Berry.Weight..gm.
1 Beatton Park Beatton Park North 2014 0.1380000
2 Beatton Park Beatton Park North 2015 NaN
3 Beatton Park Beatton Park North 2016 NaN
4 Crooked River Park Crooked River Park HumConHigh 2013 0.1687000
5 Crooked River Park Crooked River Park HumConHigh 2014 0.0870000
6 Crooked River Park Crooked River Park HumConHigh 2015 NaN
7 Crooked River Park Crooked River Park HumConHigh 2016 0.1253846
8 Crooked River Park Crooked River Park HumConHigh 2020 1.4782609
9 Elk Falls Provincial Park Elk Falls Provincial CoastMtn 2018 0.2290000
10 Elk Falls Provincial Park Elk Falls Provincial CoastMtn 2019 0.3300000
11 Elk Falls Provincial Park Elk Falls Provincial CoastMtn 2020 0.3750000
12 Elk Falls Provincial Park Elk Falls Provincial CoastMtn 2021 12.6426316
13 Eskers Park Eskers Park HumConHigh 2015 0.1600000
14 Eskers Park Eskers Park HumConHigh 2016 0.1500000
15 Eskers Park Eskers Park HumConHigh 2017 0.1200000
16 Goldstream Park Goldstream Park CoastMtn 2017 0.1500000
17 Goldstream Park Goldstream Park CoastMtn 2018 0.0800000
18 Goldstream Park Goldstream Park CoastMtn 2021 1.6250000
19 John Dean Park John Dean Park CoastMtn 2015 NaN
20 John Dean Park John Dean Park CoastMtn 2016 0.0500000
21 John Dean Park John Dean Park CoastMtn 2017 0.0500000
22 John Dean Park John Dean Park CoastMtn 2018 0.0270000
23 John Dean Park John Dean Park CoastMtn 2021 1.6500000
24 Mount Robson Park Mount Robson Park HumConHigh 2015 NaN
25 Mount Robson Park Mount Robson Park HumConHigh 2016 NaN
26 Mount Robson Park Mount Robson Park HumConHigh 2018 NaN
27 Mount Robson Park Mount Robson Park HumConHigh 2019 0.2700000
28 Mount Robson Park Mount Robson Park HumConHigh 2020 2.0666667
29 Mount Robson Park Mount Robson Park HumConHigh 2021 1.0909091
30 Mount Seymour Park Mount Seymour Park CoastMtn 2015 0.0480000
31 Mount Seymour Park Mount Seymour Park CoastMtn 2016 0.3400000
32 Mount Seymour Park Mount Seymour Park CoastMtn 2018 1.6363636
33 Schoen Lake Park Schoen Lake Park CoastMtn 2015 NaN
34 Schoen Lake Park Schoen Lake Park CoastMtn 2016 1.5500000
35 Schoen Lake Park Schoen Lake Park CoastMtn 2017 NaN
36 Schoen Lake Park Schoen Lake Park CoastMtn 2018 1.3500000
37 Schoen Lake Park Schoen Lake Park CoastMtn 2019 2.0555556
38 Schoen Lake Park Schoen Lake Park CoastMtn 2020 1.7500000
39 Strathcona Park Strathcona Park CoastMtn 2014 NaN
40 Strathcona Park Strathcona Park CoastMtn 2016 0.3500000
41 Strathcona Park Strathcona Park CoastMtn 2017 0.0496000
42 Strathcona Park Strathcona Park CoastMtn 2018 1.0000000
43 Strathcona Park Strathcona Park CoastMtn 2019 2.5000000
44 Strathcona Park Strathcona Park CoastMtn 2020 1.5000000
45 Strathcona Park Strathcona Park CoastMtn 2021 1.7500000
46 Yaaguun Gandlaay Heritage Site Yaaguun Gandlaay Her CoastMtn 2014 0.3000000
47 Yaaguun Gandlaay Heritage Site Yaaguun Gandlaay Her CoastMtn 2015 0.1400000
48 Yaaguun Gandlaay Heritage Site Yaaguun Gandlaay Her CoastMtn 2016 0.1900000
49 Yaaguun Gandlaay Heritage Site Yaaguun Gandlaay Her CoastMtn 2017 0.2600000
50 Yaaguun Gandlaay Heritage Site Yaaguun Gandlaay Her CoastMtn 2018 1.9600000
51 Yaaguun Gandlaay Heritage Site Yaaguun Gandlaay Her CoastMtn 2019 1.3500000
52 Yaaguun Gandlaay Heritage Site Yaaguun Gandlaay Her CoastMtn 2020 1.2500000
53 Yaaguun Gandlaay Heritage Site Yaaguun Gandlaay Her CoastMtn 2021 1.5789474
Why is mean berry weight missing for so many site years?
3.1.2 Preliminary plot
A summary plot of the mean berry weight in each year for each STUDY_AREA and an the trends over time is shown in Figure 1.
Strange value in Coast Mountains ECODOMAIN that needs investigation
We notice that the North EcoDomain only has a single site with a single year and so cannot be used in the analysis. In general, each EcoDomain needs at least one site with at least 3 years of data.
Following ECODOMAINS removed because not enought sites with enough years
ECODOMAIN n.sites.at.least.3.years
3 North 0
3.1.3 Model
A non-parallel slope model is fit allowing for a different average slope (over the multiple STUDY_AREAs) in each EcoDomains (non-parallel slopes). Within each EcoDomain, each STUDY_AREA’s slope is allowed to vary randomly around the average slope for the EcoDomain. Within each STUDY_AREA, each transect is allowed to have a different intercept but common slope. Finally, we allow for year-specific factor within each EcoDomain.
The analysis is done on the logarithmic scale so that trends over have a simple interpretation.
The model, in a short-hand notation is:
\[log(MeanBerryWeight) \sim \mathit{EcoDomain} + \mathit{Year} + \mathit{EcoDomain}:\mathit{Year}+ \mathit{StudyArea} + \mathit{StudyArea}:\mathit{Year(R)} + \mathit{YearF:EcoDomain(R)}\]
where
- \(log(MeanBerryWeight)\) is logarithm of the mean berry weight in that year for a STUDY_AREA.
- \(\mathit{Year}\) term represents the average (over all EcoDomains) calendar year trend over time.
- \(\mathit{EcoDomain}\) term represents a different intercept for each EcoDomain
- \(\mathit{EcoDomain}:\mathit{Year}\) term represents the differential average slope for each EcoDomain
- \(\mathit{StudyArea}:\mathit{Year(R)}\) term represents the random slopes within each EcoDomain for each study area
- \(\mathit{YearF:EcoDomain(R)}\) represents the (random) year-specific effects (process error), thare are allowed to vary across EcoDomains.
The \(\mathit{YearF:EcoDomain}\) term represent the year-specific effects (process error) caused by environmental factors (e.g., a warmer than normal year may yield larger berries, on average).
Model fit on the logarithmic scale assume that effects are multiplicative over time, so that the when the actual fit is done on the logarithmic scale, the trends are linear. For example, a trend may assume that there is constant 5% change over time rather than a fixed 1-unit change per year.
The model was fit using the lmerTest() function (Kuznetsova, 2017) in R (R Core Team, 2023).
boundary (singular) fit: see help('isSingular')
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
Year 21.4024 21.4024 1 23.874 25.9609 3.315e-05 ***
ECODOMAINF 0.7112 0.7112 1 23.877 0.8627 0.3623
Year:ECODOMAINF 0.7114 0.7114 1 23.875 0.8629 0.3622
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: log(Mean.Berry.Weight..gm.) ~ Year + ECODOMAINF + Year:ECODOMAINF + (1 | STUDY_AREA_NAMEF) + (Year - 1 | STUDY_AREA_NAME) + (1 | YearF:ECODOMAINF)
Data: berry.weight
REML criterion at convergence: 125.4
Scaled residuals:
Min 1Q Median 3Q Max
-1.96500 -0.58151 -0.02038 0.54093 2.07313
Random effects:
Groups Name Variance Std.Dev.
YearF:ECODOMAINF (Intercept) 0.04882 0.2210
STUDY_AREA_NAME Year 0.00000 0.0000
STUDY_AREA_NAMEF (Intercept) 0.43231 0.6575
Residual 0.82441 0.9080
Number of obs: 42, groups: YearF:ECODOMAINF, 16; STUDY_AREA_NAME, 10; STUDY_AREA_NAMEF, 10
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) -1018.0709 193.1253 9.0659 -5.272 0.000501 ***
Year 0.5041 0.0957 9.0619 5.268 0.000504 ***
ECODOMAINFHumConHigh 313.6068 337.6401 23.8771 0.929 0.362278
Year:ECODOMAINFHumConHigh -0.1555 0.1674 23.8747 -0.929 0.362229
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) Year ECODOM
Year -1.000
ECODOMAINFH -0.572 0.572
Y:ECODOMAIN 0.572 -0.572 -1.000
optimizer (nloptwrap) convergence code: 0 (OK)
boundary (singular) fit: see help('isSingular')
3.1.4 Test for no difference in trends among EcoDomains
After the model is fit, the ANOVA tables summarizes the test for no differences in trends among the EcoDomains in Table 1.
Source | NumDF | DenDF | p.value |
|---|---|---|---|
Year | 1 | 23.9 | p < .001 |
ECODOMAIN | 1 | 23.9 | p = 0.362 |
Year:ECODOMAIN | 1 | 23.9 | p = 0.362 |
The Year:ECODOMAIN term tests the hypothesis that the slopes in all of the EcoDomains are equal. If the p-value is large, then there is no evidence that the slopes among the EcoDomains are different.
3.1.5 Estimated slopes
The estimated coefficients from R output above do not have a direct interpretation because of the way that R codes the design matrix for discrete variables such as the EcoDomain.These coefficients associated with each EcoDomain are the difference in the slopes between the slope for that particular EcoDomain and the reference EcoDomain (typically the EcoDomain that is “first” alphabetically).
The estimated slope for each EcoDomain are estimated using the emmeans package (Lenth, 2023) presented in Table 2.
Eco Domain | Slope | SE | df | 95% LCL | 95% UCL | Grouping |
|---|---|---|---|---|---|---|
HumConHigh | 0.3486 | 0.1373 | 34.2 | 0.0697 | 0.6276 | 1 |
CoastMtn | 0.5041 | 0.0957 | 9.1 | 0.2878 | 0.7203 | 1 |
Degrees-of-freedom method: satterthwaite | ||||||
Confidence level used: 0.95 | ||||||
Degrees-of-freedom method: satterthwaite | ||||||
significance level used: alpha = 0.05 | ||||||
NOTE: If two or more means share the same grouping symbol, | ||||||
The estimated trends (on the logarithmic scale) can be interpretted as the proportional change per year. For example, the slope of 0.504 in the CoastMtn EcoDomain can be interpretted as an approximate 50.4% change in the mean berry weight per year.
The Grouping column indicates if there is evidence of a difference among the slopes across EcoDomains. EcoDomains that contain the same “digit” would indicate no evidence of a difference in the mean slopes among the EcoDomains. This is a summary of the overall p-value for parallelism found in the ANOVA table of p = 0.362. For more information on interpretting the .group variable, refer to https://schmidtpaul.github.io/DSFAIR/compactletterdisplay.html.
3.1.6 Mean slope over all EcoDomains
The average slope over all EcoDomains (giving each EcoDomain equal weight regardless of the number of study areas in the EcoDomain) shown in Table 3.
Mean slope | Slope | SE | df | 95% LCL | 95% UCL | p.value |
|---|---|---|---|---|---|---|
overall | 0.4264 | 0.0837 | 23.9 | 0.2536 | 0.5991 | p < .001 |
Results are averaged over the levels of: ECODOMAINF | ||||||
Degrees-of-freedom method: satterthwaite | ||||||
Confidence level used: 0.95 | ||||||
The overall average slope over all EcoDomain is 0.426 (SE 0.084 with a p-value of p < .001. This is the same p-value from the ANOVA table for the Year term.
The estimated mean slope is interpreted in the same way as the slopes for each EcoDomain.
3.1.7 Summary plot
Figure 2 shows a summary plot, along with estimates of the slope, its standard error, and the p-value of the hypothesis of no trend in each EcoDomain.
3.1.8 Estimated variance component
The estimated variance components are shown in Table 4.
Component | Interaction | SD |
|---|---|---|
YearF:ECODOMAIN | 0.221 | |
STUDY_AREA_NAME | Year | 0.000 |
STUDY_AREA_NAME | 0.658 | |
Residual | 0.908 |
The STUDY_AREA_NAME and STUDY_AREA_NAME:Year represent the variation in the intercepts and slopes among study areas within an EcoDomain. The YearF:ECODOMAIN components represents the variation in the year specific factors (process error) within the EcoDomains. Finally the Residual component represents the left-over, unexplained variation in the data.
Notice in this case, that the variation in the slopes for each STUDY_AREA within the EcoDomains is very close to 0, i.e., the different study areas could all have similar slopes within the EcoDomain. If you examine the individual slopes in each study area (see the individual study area reports), the estimated standard error is quite large indicating the the slopes could be roughly the same across study areas (see Figure 3).
The year specific effects vary slightly among the EcoDomains as shown in Figure 4.
We see that the year-specific effects can vary across EcoDomains, e.g., the year-specific impact of weather can be different in the different EcoDomains. This is not surprising because B.C. is a large Province!
3.1.9 Residual plots
Residual plots are presented in Figure 5.
In the upper left corner is a plot of residuals vs. the fitted values. A good plot will show a random scatter around 0. Any large deviations from 0 should be investigated as potential outliers. In the upper right is a normal probability plot. Points should be close to the dashed reference line. Fortunately, the analysis is fairly robust against non-normality so only extreme departures are worrisome. Caterpiller plots attempt to show the distribution of the random effects. The bottom left plot shows the distribution of the transect effects. The bottom right plot shows the distribution of the year-specific effects (process variation). In this case, the estimated process variation is very small with most of points very close to 0.
3.2 Stem Diameter.
This measurement is taken at the stem level and so there is one value per stem/bush/year. The same stem is repeatedly measured over time, but stems may leave the protocol (damaged or dead) or be added to the protocol (replacement stem) over time. All of the models below automatically will account for stems that are removed or added as long as each stem has a unique label within a site.
The first few records are:
STUDY_AREA_NAME_shrt ECODOMAIN Year BUSH STEM Branch.Diameter..mm.
1 Beatton Park North 2015 plant2 plant2-branch34 9.5
2 Beatton Park North 2016 plant2 plant2-branch35 4.2
3 Beatton Park North 2014 plant2 plant2-branch35 7.3
4 Beatton Park North 2016 plant4 plant4-branch39 4.8
5 Beatton Park North 2014 plant2 plant2-branch34 5.5
6 Beatton Park North 2016 plant5 plant5-branch41 9.4
Missing values are excluded. If there are some records with a reported diameter of 0 – these likely need to be checked.
Number of records prior to removing NA
[1] 1108 129
Number of records after to removing NA
[1] 1042 129
Record with 0 branch diameter that need attending
STUDY_AREA_NAME Year BUSH STEM Branch.Diameter..mm.
775 Schoen Lake Park 2015 plant10 plant10-branch19 0
3.2.1 Preliminary plot
A summary plot of the mean stem diameter in each year for each STUDY_AREA and an the trends over time is shown in Figure 6.
Need to check out the odd value for Coast Mountain stem diameter.
3.2.2 Model
A non-parallel slope model is fit allowing for a different average slope (over the multiple STUDY_AREAs) in each EcoDomains (non-parallel slopes). Within each EcoDomain, each STUDY_AREA’s slope is allowed to vary randomly around the average slope for the EcoDomain. Within each STUDY_AREA, each BUSH and STEM is allowed to have a different intercept but common slope. Finally, we allow for year-specific factor within each EcoDomain.
The model, in a short-hand notation is:
\[log(BranchDiameter) \sim \mathit{EcoDomain} + \mathit{Year} + \mathit{EcoDomain}:\mathit{Year}+ \mathit{StudyArea} + \mathit{Bush(R)} +\mathit{Stem(R)} + \mathit{StudyArea}:\mathit{Year(R)} + \mathit{YearF:EcoDomain(R)}\]
where
- \(log(BranchDiameter)\) is logarithm of the branch diameter on that bush and stem in that year.
- \(\mathit{Year}\) term represents the average (over all EcoDomains) calendar year trend over time.
- \(\mathit{EcoDomain}\) term represents a different intercept for each EcoDomain
- \(\mathit{EcoDomain}:\mathit{Year}\) term represents the differential average slope for each EcoDomain
- \(\mathit{StudyArea}:\mathit{Year(R)}\) term represents the random slopes within each EcoDomain for each study area
- \(\mathit{Stem(R)}\) represents the (random) stem effect;
- \(\mathit{Bush(R)}\) represents the (random) bush effect;
- \(\mathit{YearF:EcoDomain(R)}\) represents the (random) year-specific effects (process error), thare are allowed to vary across EcoDomains.
The The \(\mathit{Stem(R)}\) and \(\mathit{Bush(R)}\) terms allows for the fact that bush and stem specific conditions may tend to affect the branch diameter consistently over time. The \(\mathit{YearF:EcoDomain}\) term represent the year-specific effects (process error) caused by environmental factors (e.g., a warmer than normal year may lead to larger branch diameters).
Model fit on the logarithmic scale assume that effects are multiplicative over time, so that the when the actual fit is done on the logarithmic scale, the trends are linear. For example, a trend may assume that there is constant 5% change over time rather than a fixed 1-unit change per year.
The model was fit using the lmerTest() function (Kuznetsova, 2017) in R (R Core Team, 2023).
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
Year 0.007614 0.0076137 1 8.6927 0.0570 0.8169
ECODOMAINF 0.030907 0.0154537 2 8.3902 0.1156 0.8922
Year:ECODOMAINF 0.034967 0.0174836 2 8.5302 0.1308 0.8791
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: log(Branch.Diameter..mm.) ~ Year + ECODOMAINF + Year:ECODOMAINF + (1 | STUDY_AREA_NAMEF) + (Year - 1 | STUDY_AREA_NAME) + (1 | BUSHF) + (1 | STEMF) + (1 | YearF:ECODOMAINF)
Data: stemd.df
REML criterion at convergence: 1237.8
Scaled residuals:
Min 1Q Median 3Q Max
-3.7633 -0.4212 -0.0145 0.4129 11.9217
Random effects:
Groups Name Variance Std.Dev.
STEMF (Intercept) 0.01706 0.1306
BUSHF (Intercept) 0.03129 0.1769
YearF:ECODOMAINF (Intercept) 0.01826 0.1351
STUDY_AREA_NAME Year 0.20597 0.4538
STUDY_AREA_NAMEF (Intercept) 63.92096 7.9951
Residual 0.13367 0.3656
Number of obs: 1041, groups: STEMF, 296; BUSHF, 126; YearF:ECODOMAINF, 20; STUDY_AREA_NAME, 11; STUDY_AREA_NAMEF, 11
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 3.557824 3.051622 8.233804 1.166 0.276
Year -0.096638 0.173193 8.240467 -0.558 0.592
ECODOMAINFHumConHigh -2.654857 5.565536 8.197576 -0.477 0.646
ECODOMAINFNorth -0.279352 8.722323 8.580866 -0.032 0.975
Year:ECODOMAINFHumConHigh 0.158359 0.316163 8.233350 0.501 0.630
Year:ECODOMAINFNorth -0.003138 0.498551 8.828143 -0.006 0.995
Correlation of Fixed Effects:
(Intr) Year ECODOMAINFH ECODOMAINFN Y:ECODOMAINFH
Year -0.019
ECODOMAINFH -0.548 0.010
ECODOMAINFN -0.350 0.007 0.192
Y:ECODOMAINFH 0.010 -0.548 -0.018 -0.004
Y:ECODOMAINFN 0.007 -0.347 -0.004 -0.046 0.190
3.2.3 Test for no difference in trends among EcoDomains
After the model is fit, the ANOVA tables summarizes the test for no differences in trends among the EcoDomains in Table 5.
Source | NumDF | DenDF | p.value |
|---|---|---|---|
Year | 1 | 8.7 | p = 0.817 |
ECODOMAIN | 2 | 8.4 | p = 0.892 |
Year:ECODOMAIN | 2 | 8.5 | p = 0.879 |
The Year:ECODOMAIN term tests the hypothesis that the slopes in all of the EcoDomains are equal. If the p-value is large, then there is no evidence that the slopes among the EcoDomains are different.
3.2.4 Estimated slopes
The estimated coefficients from R output above do not have a direct interpretation because of the way that R codes the design matrix for discrete variables such as the EcoDomain.These coefficients associated with each EcoDomain are the difference in the slopes between the slope for that particular EcoDomain and the reference EcoDomain (typically the EcoDomain that is “first” alphabetically).
The estimated slope for each EcoDomain are estimated using the emmeans package (Lenth, 2023) presented in Table 6.
Eco Domain | Slope | SE | df | 95% LCL | 95% UCL | Grouping |
|---|---|---|---|---|---|---|
North | -0.0998 | 0.4675 | 8.9 | -1.1589 | 0.9594 | 1 |
CoastMtn | -0.0966 | 0.1732 | 8.2 | -0.4940 | 0.3007 | 1 |
HumConHigh | 0.0617 | 0.2645 | 8.2 | -0.5453 | 0.6687 | 1 |
Degrees-of-freedom method: satterthwaite | ||||||
Confidence level used: 0.95 | ||||||
Degrees-of-freedom method: satterthwaite | ||||||
P value adjustment: tukey method for comparing a family of 3 estimates | ||||||
significance level used: alpha = 0.05 | ||||||
NOTE: If two or more means share the same grouping symbol, | ||||||
The estimated trends (on the logarithmic scale) can be interpretted as the proportional change per year. For example, the slope of -0.097 in the CoastMtn EcoDomain can be interpretted as an approximate -9.7% change in the mean call rate per year.
The Grouping column indicates if there is evidence of a difference among the slopes across EcoDomains. EcoDomains that contain the same “digit” would indicate no evidence of a difference in the mean slopes among the EcoDomains. This is a summary of the overall p-value for parallelism found in the ANOVA table of p = 0.879. For more information on interpretting the .group variable, refer to https://schmidtpaul.github.io/DSFAIR/compactletterdisplay.html.
3.2.5 Mean slope over all EcoDomains
The average slope over all EcoDomains (giving each EcoDomain equal weight regardless of the number of study areas in the EcoDomain) shown in Table 7.
Mean slope | Slope | SE | df | 95% LCL | 95% UCL | p.value |
|---|---|---|---|---|---|---|
overall | -0.0449 | 0.1881 | 8.7 | -0.4728 | 0.3830 | p = 0.817 |
Results are averaged over the levels of: ECODOMAINF | ||||||
Degrees-of-freedom method: satterthwaite | ||||||
Confidence level used: 0.95 | ||||||
The overall average slope over all EcoDomain is -0.045 (SE 0.188 with a p-value of p = 0.817. This is the same p-value from the ANOVA table for the Year term.
The estimated mean slope is interpreted in the same way as the slopes for each EcoDomain.
3.2.6 Summary plot
Figure 7 shows a summary plot, along with estimates of the slope, its standard error, and the p-value of the hypothesis of no trend in each EcoDomain.
3.2.7 Estimated variance component
The estimated variance components are shown in Table 8.
Component | Interaction | SD |
|---|---|---|
STEM | 0.131 | |
BUSH | 0.177 | |
YearF:ECODOMAIN | 0.135 | |
STUDY_AREA_NAME | Year | 0.454 |
STUDY_AREA_NAME | 7.995 | |
Residual | 0.366 |
The STUDY_AREA_NAME and STUDY_AREA_NAME:Year represent the variation in the intercepts and slopes among study areas within an EcoDomain. The BUSH and STEM components represents the variation in intercepts among the bushes and stems within the same STUDY_AREA. The YearF:ECODOMAIN components represents the variation in the year specific factors (process error) within the EcoDomains. Finally the Residual component represents the left-over, unexplained variation in the data.
Notice in this case, that the variation in the slopes for each STUDY_AREA within the EcoDomains is very close to 0, i.e., the different study areas could all have similar slopes within the EcoDomain. If you examine the individual slopes in each study area (see the individual study area reports), the estimated standard error is quite large indicating the the slopes could be roughly the same across study areas (see Figure 8).
The year specific effects vary slightly among the EcoDomains as shown in Figure 9.
We see that the year-specific effects can vary across EcoDomains, e.g., the year-specific impact of weather can be different in the different EcoDomains. This is not surprising because B.C. is a large Province!
3.2.8 Residual plots
Residual plots are presented in Figure 10.
In the upper left corner is a plot of residuals vs. the fitted values. A good plot will show a random scatter around 0. Any large deviations from 0 should be investigated as potential outliers. In the upper right is a normal probability plot. Points should be close to the dashed reference line. Fortunately, the analysis is fairly robust against non-normality so only extreme departures are worrisome. Caterpiller plots attempt to show the distribution of the random effects. The bottom left plot shows the distribution of the transect effects. The bottom right plot shows the distribution of the year-specific effects (process variation). In this case, the estimated process variation is very small with most of points very close to 0.
3.3 Berry Count.
This measurement is taken at the stem level and so there is one value per stem/plant/year. The same stem/plant is repeated measured over time. All of the models below automatically will account for branches that are removed or added as long as each stem has a unique label within a site. The models for the berry count are similar to those from the stem diameter except that the counts may be somewhat smallish. The average count is about 5 or less, then a Poisson regression can, in theory, be used. However, if there are several random effects, then a Poisson mixed effects model is extremely difficult to fit. However, for larger counts, a linear mixed model on the log(counts+0.5) will work well and avoidd many of the problems in dealing with generalized linear mixed models.
The first few records are:
STUDY_AREA_NAME_shrt ECODOMAIN Year BUSH STEM Berry.Count
1 Beatton Park North 2015 plant2 plant2-branch34 2
2 Beatton Park North 2016 plant2 plant2-branch35 4
3 Beatton Park North 2014 plant2 plant2-branch35 18
4 Beatton Park North 2016 plant4 plant4-branch39 54
5 Beatton Park North 2014 plant2 plant2-branch34 24
6 Beatton Park North 2016 plant5 plant5-branch41 46
Missing values are excluded.
Number of records prior to removing NA
[1] 1108 129
Number of records after to removing NA
[1] 1065 129
3.3.1 Preliminary plot
A summary plot of the mean berry count in each year for each STUDY_AREA and an the trends over time is shown in Figure 11.
3.3.2 Model
A non-parallel slope model is fit allowing for a different average slope (over the multiple STUDY_AREAs) in each EcoDomains (non-parallel slopes). Within each EcoDomain, each STUDY_AREA’s slope is allowed to vary randomly around the average slope for the EcoDomain. Within each STUDY_AREA, each BUSH and STEM is allowed to have a different intercept but common slope. Finally, we allow for year-specific factor within each EcoDomain.
We add an offset of 0.5 to avoid taking logarithms of 0.
The model, in a short-hand notation is:
\[log(BerryCount+0.5) \sim \mathit{EcoDomain} + \mathit{Year} + \mathit{EcoDomain}:\mathit{Year}+ \mathit{StudyArea} + \mathit{Bush(R)} +\mathit{Stem(R)} + \mathit{StudyArea}:\mathit{Year(R)} + \mathit{YearF:EcoDomain(R)}\]
where
- \(log(BerryCount+0.05)\) is logarithm of the berry count on that bush and stem in that year. An offset of 0.5 is added to avoid taking logarithms of 0.
- \(\mathit{Year}\) term represents the average (over all EcoDomains) calendar year trend over time.
- \(\mathit{EcoDomain}\) term represents a different intercept for each EcoDomain
- \(\mathit{EcoDomain}:\mathit{Year}\) term represents the differential average slope for each EcoDomain
- \(\mathit{StudyArea}:\mathit{Year(R)}\) term represents the random slopes within each EcoDomain for each study area
- \(\mathit{Stem(R)}\) represents the (random) stem effect;
- \(\mathit{Bush(R)}\) represents the (random) bush effect;
- \(\mathit{YearF:EcoDomain(R)}\) represents the (random) year-specific effects (process error), thare are allowed to vary across EcoDomains.
The The \(\mathit{Stem(R)}\) and \(\mathit{Bush(R)}\) terms allows for the fact that bush and stem specific conditions may tend to affect the berry count consistently over time. The \(\mathit{YearF:EcoDomain}\) term represent the year-specific effects (process error) caused by environmental factors (e.g., a warmer than normal year may lead to more berries in all bushes and stems).
Model fit on the logarithmic scale assume that effects are multiplicative over time, so that the when the actual fit is done on the logarithmic scale, the trends are linear. For example, a trend may assume that there is constant 5% change over time rather than a fixed 1-unit change per year. Some caution is needed if any of the values are 0 as log(0) is not defined. In these cases, a small constant (typically ½ of the smallest positive value in the dataset) is added to all values before the analysis proceeds.
The model was fit using the lmerTest() function (Kuznetsova, 2017) in R (R Core Team, 2023).
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
Year 0.01873 0.01873 1 13.566 0.0107 0.9191
ECODOMAINF 2.63346 1.31673 2 12.374 0.7533 0.4912
Year:ECODOMAINF 2.44830 1.22415 2 11.280 0.7003 0.5168
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: log(Berry.Count + 0.5) ~ Year + ECODOMAINF + Year:ECODOMAINF + (1 | STUDY_AREA_NAMEF) + (Year - 1 | STUDY_AREA_NAME) + (1 | BUSHF) + (1 | STEMF) + (1 | YearF:ECODOMAINF)
Data: bcount.df
REML criterion at convergence: 3946.7
Scaled residuals:
Min 1Q Median 3Q Max
-2.93625 -0.72864 0.03296 0.72011 2.84822
Random effects:
Groups Name Variance Std.Dev.
STEMF (Intercept) 0.3093 0.5561
BUSHF (Intercept) 0.1744 0.4176
YearF:ECODOMAINF (Intercept) 1.3294 1.1530
STUDY_AREA_NAME Year 0.1812 0.4256
STUDY_AREA_NAMEF (Intercept) 54.2329 7.3643
Residual 1.7480 1.3221
Number of obs: 1065, groups: STEMF, 301; BUSHF, 126; YearF:ECODOMAINF, 20; STUDY_AREA_NAME, 11; STUDY_AREA_NAMEF, 11
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 3.25485 4.26309 12.97403 0.763 0.459
Year -0.08658 0.24334 11.62430 -0.356 0.728
ECODOMAINFHumConHigh -7.59966 6.77161 10.91995 -1.122 0.286
ECODOMAINFNorth 4.44935 15.25021 14.25908 0.292 0.775
Year:ECODOMAINFHumConHigh 0.42759 0.39149 9.70779 1.092 0.301
Year:ECODOMAINFNorth -0.27360 0.97463 13.47607 -0.281 0.783
Correlation of Fixed Effects:
(Intr) Year ECODOMAINFH ECODOMAINFN Y:ECODOMAINFH
Year -0.563
ECODOMAINFH -0.630 0.355
ECODOMAINFN -0.280 0.158 0.176
Y:ECODOMAINFH 0.350 -0.622 -0.433 -0.098
Y:ECODOMAINFN 0.141 -0.250 -0.089 -0.755 0.155
3.3.3 Test for no difference in trends among EcoDomains
After the model is fit, the ANOVA tables summarizes the test for no differences in trends among the EcoDomains in Table 9.
Source | NumDF | DenDF | p.value |
|---|---|---|---|
Year | 1 | 13.6 | p = 0.919 |
ECODOMAIN | 2 | 12.4 | p = 0.491 |
Year:ECODOMAIN | 2 | 11.3 | p = 0.517 |
The Year:ECODOMAIN term tests the hypothesis that the slopes in all of the EcoDomains are equal. If the p-value is large, then there is no evidence that the slopes among the EcoDomains are different.
3.3.4 Estimated slopes
The estimated coefficients from R output above do not have a direct interpretation because of the way that R codes the design matrix for discrete variables such as the EcoDomain.These coefficients associated with each EcoDomain are the difference in the slopes between the slope for that particular EcoDomain and the reference EcoDomain (typically the EcoDomain that is “first” alphabetically).
The estimated slope for each EcoDomain are estimated using the emmeans package (Lenth, 2023) presented in Table 10.
Eco Domain | Slope | SE | df | 95% LCL | 95% UCL | Grouping |
|---|---|---|---|---|---|---|
North | -0.3602 | 0.9438 | 13.5 | -2.3919 | 1.6715 | 1 |
CoastMtn | -0.0866 | 0.2433 | 11.6 | -0.6187 | 0.4455 | 1 |
HumConHigh | 0.3410 | 0.3067 | 8.3 | -0.3615 | 1.0435 | 1 |
Degrees-of-freedom method: satterthwaite | ||||||
Confidence level used: 0.95 | ||||||
Note: contrasts are still on the 0.5(mu + 0.5) scale | ||||||
Degrees-of-freedom method: satterthwaite | ||||||
P value adjustment: tukey method for comparing a family of 3 estimates | ||||||
significance level used: alpha = 0.05 | ||||||
NOTE: If two or more means share the same grouping symbol, | ||||||
The estimated trends (on the logarithmic scale) can be interpretted as the proportional change per year. For example, the slope of -0.087 in the CoastMtn EcoDomain can be interpretted as an approximate -8.7% change in the mean call rate per year.
The Grouping column indicates if there is evidence of a difference among the slopes across EcoDomains. EcoDomains that contain the same “digit” would indicate no evidence of a difference in the mean slopes among the EcoDomains. This is a summary of the overall p-value for parallelism found in the ANOVA table of p = 0.517. For more information on interpretting the .group variable, refer to https://schmidtpaul.github.io/DSFAIR/compactletterdisplay.html.
3.3.5 Mean slope over all EcoDomains
The average slope over all EcoDomains (giving each EcoDomain equal weight regardless of the number of study areas in the EcoDomain) shown in Table 11.
Mean slope | Slope | SE | df | 95% LCL | 95% UCL | p.value |
|---|---|---|---|---|---|---|
overall | -0.0353 | 0.3406 | 13.6 | -0.7679 | 0.6974 | p = 0.919 |
Results are averaged over the levels of: ECODOMAINF | ||||||
Degrees-of-freedom method: satterthwaite | ||||||
Confidence level used: 0.95 | ||||||
The overall average slope over all EcoDomain is -0.035 (SE 0.341 with a p-value of p = 0.919. This is the same p-value from the ANOVA table for the Year term.
The estimated mean slope is interpreted in the same way as the slopes for each EcoDomain.
3.3.6 Summary plot
Figure 12 shows a summary plot, along with estimates of the slope, its standard error, and the p-value of the hypothesis of no trend in each EcoDomain.
3.3.7 Estimated variance component
The estimated variance components are shown in Table 12.
Component | Interaction | SD |
|---|---|---|
STEM | 0.556 | |
BUSH | 0.418 | |
YearF:ECODOMAIN | 1.153 | |
STUDY_AREA_NAME | Year | 0.426 |
STUDY_AREA_NAME | 7.364 | |
Residual | 1.322 |
The STUDY_AREA_NAME and STUDY_AREA_NAME:Year represent the variation in the intercepts and slopes among study areas within an EcoDomain. The BUSH and STEM components represents the variation in intercepts among the bushes and stems within the same STUDY_AREA representing bush- and stem-specific factors that affect the berry count. The YearF:ECODOMAIN components represents the variation in the year specific factors (process error) within the EcoDomains. Finally the Residual component represents the left-over, unexplained variation in the data.
Notice in this case, that the variation in the slopes for each STUDY_AREA within the EcoDomains is very close to 0, i.e., the different study areas could all have similar slopes within the EcoDomain. If you examine the individual slopes in each study area (see the individual study area reports), the estimated standard error is quite large indicating the the slopes could be roughly the same across study areas (see Figure 13).
The year specific effects vary slightly among the EcoDomains as shown in Figure 14.
We see that the year-specific effects can vary across EcoDomains, e.g., the year-specific impact of weather can be different in the different EcoDomains. This is not surprising because B.C. is a large Province!
3.3.8 Residual plots
Residual plots are presented in Figure 15.
In the upper left corner is a plot of residuals vs. the fitted values. A good plot will show a random scatter around 0. Any large deviations from 0 should be investigated as potential outliers. In the upper right is a normal probability plot. Points should be close to the dashed reference line. Fortunately, the analysis is fairly robust against non-normality so only extreme departures are worrisome. Caterpiller plots attempt to show the distribution of the random effects. The bottom left plot shows the distribution of the transect effects. The bottom right plot shows the distribution of the year-specific effects (process variation). In this case, the estimated process variation is very small with most of points very close to 0.
4 Power/sample size analysis
There are two hypotheses of most interest:
- What is the mean trend over all ecoregions. This is the Year effect test previously seen.
- Are there differences in the trend among ecoregions? This is the Year:Ecoregion test previously seen.
Because the model has many random effects, the simr package (Green and McLeod 2016) will be used to estimate power at various sample sizes (number of years of monitoring). The simr package estimates the power by simulating many datasets with the same set of random effects, doing a model fit on each simulated dataset, and computing the proportion of the simulations where the p-value for the effect of interest is less than the \(\alpha=0.5\) level to estimate the power to detect this effect.
The berry protocol measures mean berry weight, number of berries, and mean stem diameter. The individual site analyses showed that berry counts are highly variable and so any power to detect trends will be small. Conversely, the stem diameter values have very little variation over time. Consequently, we will only do a power analysis on the mean berry weight response.
4.1 Power to detect mean trend.
We will first fit a simpler model that drops the Year:Ecoregion term in the model and fit a model with parallel slopes across the ecoregions (Figure 16).
The lines may appear to be “wiggly” or decline when moving from left to right, but this is an aretefact of a small number of simulations.
The Year Effect is the proportional change/year that is of interest. For example, a Year Effect of .02, implies a 2% change/year.
A target power of .80 (when alpha=0.05) is recommended (horizontal line on plots). This may not be achieved for small year effects with less than 10 year of sampling.
4.2 Power to detect differences in trends among EcoRegions.
We are now interested in the Year:Ecoregion term in the model and need to modify the size of this interaction. We will use the current model’s overall trend. Results are shown in Figure 17).
The lines may appear to be “wiggly” or decline when moving from left to right, but this is an aretefact of a small number of simulations.
The Diff trend is the DIFFERENCE in proportional change/year among the EcoRegions. For example, a Diff trend of 0 implies that all Ecoregions have the same trend. A Diff trend of .02, implies that the trends across the EcoRegions differs by 2 percentage points/year.
A target power of .80 (when alpha=0.05) is recommended (horizontal line on plots). This may not be achieved for small differential effects with less than 10 year of sampling.
5 Summary
This analysis examines trends across EcoDomains. The model has an overall average trend, a EcoDomain specific trend, and random trends for each Study Area among the EcoDomain trends. With many study areas and many years of data collected in each study area, the design has a high power to detect even a moderate trend over time.
6 References
Kuznetsova A, Brockhoff PB, Christensen RHB (2017). “lmerTest Package: Tests in Linear Mixed Effects Models.” Journal of Statistical Software, 82, 1-26. doi:10.18637/jss.v082.i13 https://doi.org/10.18637/jss.v082.i13.
Lenth R (2023). emmeans: Estimated Marginal Means, aka Least-Squares Means. R package version 1.8.4-1, https://CRAN.R-project.org/package=emmeans.
R Core Team (2032). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.